Refine complex variants and translocations #736
Thank you! Yes to all three questions, I think.
16. `16-RefineComplexVariants`: Complex variant and translocation refinement
17. `17-JoinRawCalls`: Combines unfiltered calls (from step 5) across batches
18. `18-SVConcordance`: Annotates variants with genotype concordance against raw calls
19. `19-FilterGenotypes`: Performs genotype filtering to improve precision and generates QC plots
20. `20-AnnotateVcf`: Cohort VCF annotations, including functional annotation, allele frequency (AF) annotation, and AF annotation with external population callsets
You may need to rebase since the FilterGenotypes documentation is merged now.
I rebased yesterday after you merged. I didn't add these lines from scratch, just updated the numbering.
for info in cpx_sv:
    breakpoints = info[0]
    if info[1][0] == 'ab' and info[1][1] == 'b^':  # delINV
        common_1 = ['tabix', 'PE_metrics', breakpoints[0] + ':' + str(breakpoints[1] - flank_back) + '-' + str(breakpoints[1] + flank_front), '| grep', 'sample', '| awk', "'{if ($1==$4", '&&', '$3=="+" && $6=="+"', '&&', '$5>' + str(breakpoints[3] - flank_back), '&&', '$5<' + str(breakpoints[3] + flank_front), ") print}' | sed -e 's/$/\\t", info[2], "/'", '>>', pe_evidence]
☠️
Yeah...
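For reference, the row filter that the `grep | awk | sed` stages above express can be sketched in plain Python. This is only an illustration, not code from this PR: the function name `filter_pe_lines` and its parameters are hypothetical, and it assumes each PE metrics line holds chrom1, pos1, strand1, chrom2, pos2, strand2 in its first six whitespace-separated fields, as the awk expression implies. The region check on the first position stands in for the `tabix` query.

```python
def filter_pe_lines(lines, sample, chrom, pos1, pos2, flank_back, flank_front, label):
    """Keep same-chromosome +/+ pairs near both breakpoints and append `label`.

    Hypothetical helper; column layout is an assumption taken from the awk filter.
    """
    kept = []
    for line in lines:
        text = line.rstrip("\n")
        if sample not in text:  # mirrors `grep sample` (substring match anywhere on the line)
            continue
        fields = text.split()  # awk splits on whitespace by default
        if len(fields) < 6:
            continue
        chrom1, start1, strand1, chrom2, start2, strand2 = fields[:6]
        start1, start2 = int(start1), int(start2)
        # Stand-in for the tabix region query around the first breakpoint.
        in_region = chrom1 == chrom and pos1 - flank_back <= start1 <= pos1 + flank_front
        # Same strict bounds as the awk test on $5.
        near_mate = pos2 - flank_back < start2 < pos2 + flank_front
        if in_region and chrom1 == chrom2 and strand1 == "+" and strand2 == "+" and near_mate:
            kept.append(text + "\t" + label)  # mirrors the sed step that appends a tab and a tag
    return kept
```

Writing the filter as a function keeps each condition named and testable, at the cost of reading the tabix output into Python rather than streaming it through a shell pipeline.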
wdl/CleanVcfChromosome.wdl
Outdated
# disk is cheap, read/write speed is proportional to disk size, and disk IO is a significant time factor:
# in tests on large VCFs, memory usage is ~1.0 * input VCF size
# the biggest disk usage is at the end of the task, with input + output VCF on disk
Int cpu_cores = 2 # speed up compression / decompression of VCFs
Is this all relevant or just copy+paste? I'm not sure multiple cores help unless they're explicitly requested in the CLI.
Yeah, this was just copy/paste. I can change to 1 CPU.
Good to merge assuming FilterGenotypes tests complete.
ad95927 to 6803944
Updates
This PR updates and integrates the ManualReview workflow originally authored by @xuefzhao and previously reviewed by @cwhelan into the main GATK-SV pipeline.
Questions
.
for mCNVs - is this ok? [Yes - left as-is]
For the future
Testing
Ongoing testing [Complete]